6 research outputs found
Collaborative ranking from ordinal data
Personalized recommendation systems must predict the preferences of a user for items the user has not yet seen. For cardinal (ratings) data, personalized preference prediction has been solved efficiently over the past few years using matrix-factorization-related techniques. Recent studies have shown that ordinal (comparison) data can outperform cardinal data in learning preferences, but learning personalized preferences from ordinal data has received little study. This thesis presents a matrix-factorization-inspired convex relaxation algorithm that collaboratively learns the hidden preferences of users through the multinomial logit (MNL) model, a discrete choice model. It also shows that the algorithm is efficient in the number of observations needed.
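The core of the MNL model is a softmax over latent user-item utilities; collaborative structure comes from assuming the utility matrix is (approximately) low rank. A minimal sketch of that choice probability, with illustrative variable names that are not taken from the thesis:

```python
import numpy as np

def mnl_choice_probs(theta_u, choice_set):
    """Multinomial logit: probability that a user picks each item offered in
    `choice_set`, given that user's latent utility vector `theta_u`.
    Illustrative sketch only; names and shapes are assumptions."""
    scores = theta_u[choice_set]
    scores = scores - scores.max()  # shift for numerical stability
    exps = np.exp(scores)
    return exps / exps.sum()        # softmax over the offered set

# Low-rank utility matrix Theta = U @ V.T encodes collaborative structure:
# each user's utilities are a combination of a few latent item factors.
rng = np.random.default_rng(0)
U = rng.normal(size=(5, 2))   # 5 users, rank-2 factors
V = rng.normal(size=(10, 2))  # 10 items
Theta = U @ V.T

probs = mnl_choice_probs(Theta[0], [1, 4, 7])
print(probs)  # choice probabilities over the three offered items; sums to 1
```

The convex relaxation in the thesis replaces the hard rank constraint on Theta with a tractable surrogate (such as a nuclear-norm bound) when fitting this model to observed comparisons; that optimization step is not shown here.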
Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity
While deep learning (DL) models are state-of-the-art in text and image
domains, they have not yet consistently outperformed Gradient Boosted Decision
Trees (GBDTs) on tabular Learning-To-Rank (LTR) problems. Most of the recent
performance gains attained by DL models in text and image tasks have used
unsupervised pretraining, which exploits orders of magnitude more unlabeled
data than labeled data. To the best of our knowledge, unsupervised pretraining
has not been applied to the LTR problem, which often produces vast amounts of
unlabeled data.
In this work, we study whether unsupervised pretraining of deep models can
improve LTR performance over GBDTs and other non-pretrained models. By
incorporating simple design choices--including SimCLR-Rank, an LTR-specific
pretraining loss--we produce pretrained deep learning models that consistently
(across datasets) outperform GBDTs (and other non-pretrained rankers) in the
case where there is more unlabeled data than labeled data. This performance
improvement occurs not only on average but also on outlier queries. We base our
empirical conclusions on experiments with (1) public benchmark tabular LTR
datasets and (2) a large industry-scale proprietary ranking dataset. Code is
provided at https://anonymous.4open.science/r/ltr-pretrain-0DAD/README.md.
Comment: ICML-MFPL 2023 Workshop Ora
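The abstract builds on SimCLR-style contrastive pretraining. A generic sketch of the underlying NT-Xent objective between two augmented views of the same batch follows; this is the standard SimCLR loss, not the paper's SimCLR-Rank variant, whose ranking-specific adaptations are not described in the abstract. All names here are illustrative assumptions:

```python
import numpy as np

def ntxent_loss(z1, z2, tau=0.5):
    """SimCLR-style NT-Xent contrastive loss. `z1` and `z2` are embeddings
    of two augmented views of the same n examples (shape (n, d)); row i of
    z1 and row i of z2 form the positive pair, all other rows are negatives."""
    z = np.vstack([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = (z @ z.T) / tau
    n = len(z1)
    np.fill_diagonal(sim, -np.inf)                     # drop self-similarity
    # index of each row's positive partner: i <-> i + n
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # cross-entropy of the positive against all other pairs (log-sum-exp)
    row_max = sim.max(axis=1, keepdims=True)
    lse = row_max.squeeze() + np.log(np.exp(sim - row_max).sum(axis=1))
    return -(sim[np.arange(2 * n), pos] - lse).mean()

rng = np.random.default_rng(0)
z1 = rng.normal(size=(4, 8))   # view 1 of a batch of 4 unlabeled examples
z2 = z1 + 0.1 * rng.normal(size=(4, 8))  # view 2: a small perturbation
loss = ntxent_loss(z1, z2)
print(loss)  # a positive scalar; smaller when paired views agree
```

In the pretraining setting the abstract describes, a loss of this kind is minimized on the abundant unlabeled queries before the ranker is fine-tuned on the scarce labeled data.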